23 research outputs found

    Nonparametric Regression via StatLSSVM

    We present a new MATLAB toolbox for Windows and Linux for nonparametric regression estimation based on the statistical library for least squares support vector machines (StatLSSVM). The StatLSSVM toolbox is written so that only a few lines of code are needed to perform standard nonparametric regression, regression with correlated errors, and robust regression. Construction of additive models and of pointwise or uniform confidence intervals is also supported. Several tuning criteria are available, including classical cross-validation, robust cross-validation, and cross-validation for correlated errors, and these criteria can be minimized without any user interaction.
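
    The toolbox itself is MATLAB code and its calling syntax is not shown in the abstract; as a rough illustration of the model it estimates, the NumPy sketch below solves the standard LS-SVM regression dual system with an RBF kernel. The function name and the fixed gamma and bandwidth values are placeholder assumptions, not the StatLSSVM API; the toolbox would tune such hyperparameters automatically via the cross-validation criteria mentioned above.

        import numpy as np

        def lssvm_regression(X, y, gamma=10.0, bandwidth=1.0):
            # Fit an LS-SVM regression model with an RBF kernel by solving the
            # dual linear system  [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
            n = X.shape[0]
            sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
            K = np.exp(-sq / (2.0 * bandwidth ** 2))
            A = np.zeros((n + 1, n + 1))
            A[0, 1:] = 1.0
            A[1:, 0] = 1.0
            A[1:, 1:] = K + np.eye(n) / gamma
            sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
            b, alpha = sol[0], sol[1:]

            def predict(X_new):
                sq_new = np.sum((X_new[:, None, :] - X[None, :, :]) ** 2, axis=-1)
                return np.exp(-sq_new / (2.0 * bandwidth ** 2)) @ alpha + b

            return predict

        # Example: smooth a noisy sine curve.
        X = np.linspace(0, 6, 80).reshape(-1, 1)
        y = np.sin(X[:, 0]) + 0.1 * np.random.randn(80)
        y_hat = lssvm_regression(X, y)(X)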

    A kernel-based integration of genome-wide data for clinical decision support

    Background: Although microarray technology allows the transcriptomic make-up of a tumor to be investigated in a single experiment, the transcriptome does not completely reflect the underlying biology due to alternative splicing, post-translational modifications, and the influence of pathological conditions (for example, cancer) on transcription and translation. This increases the importance of fusing more than one source of genome-wide data, such as the genome, transcriptome, proteome, and epigenome. The growing amount of available omics data emphasizes the need for a methodological integration framework.
    Methods: We propose a kernel-based approach for clinical decision support in which multiple genome-wide data sources are combined. Integration occurs within the patient domain, at the level of kernel matrices, before the classifier is built. A weighted least squares support vector machine is used as the supervised classification algorithm. We apply this framework to two cancer cases: a rectal cancer data set containing microarray and proteomics data, and a prostate cancer data set containing microarray and genomics data. For both cases, multiple outcomes are predicted.
    Results: For the rectal cancer outcomes, the highest leave-one-out (LOO) areas under the receiver operating characteristic curve (AUC) were obtained when combining microarray and proteomics data gathered during therapy, and ranged from 0.927 to 0.987. For prostate cancer, all four outcomes had a better LOO AUC when combining microarray and genomics data, ranging from 0.786 for recurrence to 0.987 for metastasis.
    Conclusions: For both cancer sites, the prediction of all outcomes improved when more than one genome-wide data set was considered. This suggests that integrating multiple genome-wide data sources increases the predictive performance of clinical decision support models and emphasizes the need for comprehensive multi-modal data. We acknowledge that, initially, this will substantially increase costs; however, it is a necessary investment to ultimately obtain cost-efficient models usable in patient-tailored therapy.
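
    A minimal sketch, under stated assumptions, of the two generic ingredients named in the Methods: kernel-level integration of patient-by-patient kernel matrices and a weighted LS-SVM classifier. The function names, the unit-diagonal normalization, and the uniform default source weights are illustrative choices, not the authors' pipeline.

        import numpy as np

        def normalize_kernel(K):
            # Scale a kernel matrix to unit diagonal so that kernels computed
            # from heterogeneous omics sources live on a comparable scale.
            d = np.sqrt(np.diag(K))
            return K / np.outer(d, d)

        def combine_kernels(kernels, weights=None):
            # Convex combination of per-source (patient x patient) kernel matrices.
            if weights is None:
                weights = np.full(len(kernels), 1.0 / len(kernels))
            return sum(w * normalize_kernel(K) for w, K in zip(weights, kernels))

        def weighted_lssvm_classify(K, y, gamma=1.0, v=None):
            # Weighted LS-SVM classifier: solve
            # [[0, y^T], [y, Omega + diag(1/(gamma*v))]] [b; alpha] = [0; 1],
            # with Omega_ij = y_i * y_j * K_ij and per-sample weights v_i.
            n = len(y)
            v = np.ones(n) if v is None else np.asarray(v, dtype=float)
            Omega = np.outer(y, y) * K
            A = np.zeros((n + 1, n + 1))
            A[0, 1:] = y
            A[1:, 0] = y
            A[1:, 1:] = Omega + np.diag(1.0 / (gamma * v))
            sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(n))))
            b, alpha = sol[0], sol[1:]
            # Decision values on the training kernel; for new patients, replace
            # K with the test-versus-training kernel block.
            return np.sign(K @ (alpha * y) + b)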

    L2-norm multiple kernel learning and its application to biomedical data fusion

    Background: This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, in contrast to the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have advantages over sparse integration methods for thoroughly combining complementary information in heterogeneous data sources.
    Results: We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem and the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large-scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for processing large-scale data sets.
    Conclusions: This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid the "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to performance in prospective studies. The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM-based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has performance comparable to the conventional SVM MKL algorithms. Moreover, large-scale numerical experiments indicate that, when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL.
    Availability: The MATLAB code of the algorithms implemented in this paper can be downloaded from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.
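
    To make the qualitative difference between the norms concrete, the sketch below forms a fused kernel under three weighting schemes. This is only an illustrative heuristic: the per-kernel scores, the winner-takes-all / uniform / L2-normalized updates, and the function names are assumptions, and the sketch does not reproduce the paper's dual optimization or its semi-infinite programming formulation.

        import numpy as np

        def mkl_weights(scores, norm="l2"):
            # scores: a nonnegative relevance score per kernel (e.g. how much
            # each source contributes to the current fit).  The three branches
            # mimic the qualitative behaviour of L-infinity, L1, and L2 MKL.
            s = np.asarray(scores, dtype=float)
            if norm == "linf":      # sparse: all weight on the single best kernel
                w = (s == s.max()).astype(float)
            elif norm == "l1":      # averaging: every source weighted equally
                w = np.ones_like(s)
            else:                   # "l2": non-sparse weights spread over sources
                w = s / np.linalg.norm(s)
            return w / w.sum()

        def combined_kernel(kernels, scores, norm="l2"):
            # Fused kernel matrix that a downstream (LS)SVM would be trained on.
            return sum(w * K for w, K in zip(mkl_weights(scores, norm), kernels))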

    Probabilistic matrix factorization from quantized measurements

    We consider the problem of factorizing a matrix with discrete-valued entries as a product of two low-rank matrices. Under a probabilistic framework, we seek the minimum mean-square error estimates of these matrices, using full Bayes and empirical Bayes approaches. In the first case, we devise an integration scheme based on the Gibbs sampler that also accounts for hyperparameter and noise-variance estimation. A similar technique is used for the latter case, where we combine Gibbs sampling with the expectation-maximization (EM) algorithm to estimate the model parameters via marginal likelihood maximization. The extension to missing values is also discussed. The proposed methods are evaluated on simulated data and on a real data set for recommender systems.
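
    A minimal sketch of a Gibbs sampler for probabilistic matrix factorization, assuming a Gaussian likelihood on the observed entries rather than the paper's quantized-measurement model; the priors, hyperparameters, and function name are illustrative. It samples the factor rows from their Gaussian full conditionals, samples the noise variance, and returns a posterior-mean (MMSE-style) reconstruction.

        import numpy as np

        def gibbs_pmf(Y, rank, n_iter=500, tau=1.0, sigma2=1.0, seed=0):
            # Y: matrix with np.nan marking missing entries; tau: prior precision
            # on the factor entries; sigma2: initial noise variance.
            rng = np.random.default_rng(seed)
            n, m = Y.shape
            mask = ~np.isnan(Y)
            U = rng.standard_normal((n, rank))
            V = rng.standard_normal((m, rank))
            samples = []
            for _ in range(n_iter):
                # Sample each row of U from its Gaussian full conditional.
                for i in range(n):
                    Vi = V[mask[i]]
                    cov = np.linalg.inv(tau * np.eye(rank) + Vi.T @ Vi / sigma2)
                    U[i] = rng.multivariate_normal(cov @ (Vi.T @ Y[i, mask[i]]) / sigma2, cov)
                # Sample each row of V symmetrically.
                for j in range(m):
                    Uj = U[mask[:, j]]
                    cov = np.linalg.inv(tau * np.eye(rank) + Uj.T @ Uj / sigma2)
                    V[j] = rng.multivariate_normal(cov @ (Uj.T @ Y[mask[:, j], j]) / sigma2, cov)
                # Sample the noise variance from its inverse-gamma full conditional.
                resid = (Y - U @ V.T)[mask]
                sigma2 = 1.0 / rng.gamma(1.0 + resid.size / 2.0,
                                         1.0 / (1.0 + 0.5 * np.sum(resid ** 2)))
                samples.append(U @ V.T)
            # Posterior-mean reconstruction from the second half of the chain.
            return np.mean(samples[n_iter // 2:], axis=0)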

    Optimal Quadrature-Sparsification for Integral Operator Approximation

    The design of sparse quadratures for the approximation of integral operators related to symmetric positive-semidefinite kernels is addressed. Particular emphasis is placed on the approximation of the main eigenpairs of an initial operator and on the assessment of the approximation accuracy. Special attention is drawn to the design of sparse quadratures whose support is included in a fixed finite set of points (that is, quadrature-sparsification), a framework that encompasses the approximation of kernel matrices. For a given kernel, the accuracy of a quadrature approximation is assessed through the squared Hilbert-Schmidt norm (for operators acting on the underlying reproducing kernel Hilbert space) of the difference between the integral operators related to the initial and approximate measures; by analogy with the notion of kernel discrepancy, the underlying criterion is referred to as the squared-kernel discrepancy between the two measures. In the quadrature-sparsification framework, sparsity of the approximate quadrature is promoted through an ℓ1-type penalization, and computing a penalized squared-kernel-discrepancy-optimal approximation then amounts to a convex quadratic minimization problem; such quadratic programs can in particular be interpreted as the Lagrange dual formulations of distorted one-class support vector machines related to the squared kernel. Error bounds on the induced spectral approximations are derived, and the connection between penalization, sparsity, and accuracy of the spectral approximation is investigated. Numerical strategies for solving large-scale penalized squared-kernel-discrepancy minimization problems are discussed, and the efficiency of the approach is illustrated by a series of examples. In particular, the ability of the proposed methodology to produce accurate approximations of the main eigenpairs of kernel matrices related to large-scale data sets is demonstrated.
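
    For two discrete measures supported on the same finite set of points, the squared Hilbert-Schmidt criterion above reduces to a quadratic form in the difference of their weight vectors under the entrywise-squared kernel. The sketch below computes this quantity and an ℓ1-penalized objective; the exact penalty form and function names are assumptions for illustration, and an actual sparsification would minimize such an objective over nonnegative weights with a QP solver.

        import numpy as np

        def squared_kernel_discrepancy(K, mu, nu):
            # Discrepancy between two discrete measures (weight vectors mu and nu
            # on a common set of support points) induced by the squared kernel.
            d = mu - nu
            return float(d @ (K ** 2) @ d)

        def penalized_objective(K, mu, nu, lam):
            # l1-type penalized criterion promoting sparsity of the approximate
            # quadrature nu >= 0 (the penalty reduces to lam * sum(nu) here).
            return squared_kernel_discrepancy(K, mu, nu) + lam * np.sum(nu)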

    Tensor Learning in Multi-view Kernel PCA

    In many real-life applications, data can be described through multiple representations, or views. Multi-view learning aims at combining the information from all views in order to obtain better performance. Most well-known multi-view methods optimize some form of correlation between two views, while in many applications three or more views are available. This is usually tackled by optimizing the correlations pairwise; however, doing so ignores the higher-order correlations that can only be discovered by exploring all views simultaneously. This paper proposes novel multi-view kernel PCA models. By introducing a model tensor, the proposed models aim to capture the higher-order correlations between all views. The paper further explores the use of these models as multi-view dimensionality reduction techniques and reports experimental results on several real-life datasets. These experiments demonstrate the merit of the proposed methods.
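
    As a point of reference for the models described above, the sketch below implements the single-view kernel PCA building block (kernel centering followed by an eigendecomposition). The tensor-based coupling of several views proposed in the paper is not reproduced here, and the function name and defaults are illustrative assumptions.

        import numpy as np

        def kernel_pca(K, n_components=2):
            # Single-view kernel PCA: center the kernel matrix and project onto
            # its leading eigenvectors.
            n = K.shape[0]
            H = np.eye(n) - np.ones((n, n)) / n
            Kc = H @ K @ H                      # double centering in feature space
            eigvals, eigvecs = np.linalg.eigh(Kc)
            order = np.argsort(eigvals)[::-1][:n_components]
            # Scores of the training points along the principal components.
            return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))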

    Transductive Feature Selection Using Clustering-Based Sample Entropy for Temperature Prediction in Weather Forecasting

    Entropy measures have long been used to quantify the information content of a dynamical system. One well-known methodology is sample entropy, a model-free approach that can be deployed to measure the information transfer in time series. Sample entropy is based on conditional entropy, where a major concern is the number of past delays included in the conditioning term. In this study, we deploy a lag-specific conditional entropy to identify informative past values. Moreover, considering the seasonal structure of the data, we propose a clustering-based sample entropy to exploit temporal information. Clustering-based sample entropy follows the sample entropy definition while taking into account the clustering information of the training data and the membership of the test point to the clusters. We use the proposed method for transductive feature selection in black-box weather forecasting and conduct experiments on minimum and maximum temperature prediction in Brussels for one to six days ahead. The results reveal that considering the local structure of the data can improve feature selection performance. In addition, despite the large reduction in the number of features, the performance remains competitive with using all features.
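
    A minimal sketch of the standard sample entropy computation that the proposed method builds on, under the usual template-matching definition; the tolerance default and function name are assumptions, and the lag-specific and clustering-based extensions described above are not reproduced.

        import numpy as np

        def sample_entropy(x, m=2, r=None):
            # Sample entropy of a 1-D series: -log of the ratio between the
            # number of (m+1)-length and m-length template matches within
            # tolerance r (Chebyshev distance), self-matches excluded.
            x = np.asarray(x, dtype=float)
            if r is None:
                r = 0.2 * np.std(x)

            def matches(length):
                t = np.array([x[i:i + length] for i in range(len(x) - length)])
                dist = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=-1)
                return np.sum(dist <= r) - len(t)   # subtract self-matches

            B, A = matches(m), matches(m + 1)
            return -np.log(A / B) if A > 0 and B > 0 else np.inf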